In terms of the different possible types of outsourced groups2, the numbers are as follows:
Definitely outsourced: 11%
Likely agency: 3%
High indicators: 3%
Characteristics of outsourced workers
Region
The plot below shows the proportion of workers within each region who are outsourced.3
Below we map the workforce composition in each region. The first map emphasises that London has the highest concentration of outsourced workers (25%).
The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:
East Midlands (19%)
West Midlands (18%)
Wales (18%)
North West (17%)
Northern Ireland (16%)
We can also explore how the UK’s outsourced workforce is distributed across the country.4 The table and map below show the outsourced workers in each region as a percentage of the UK’s total outsourced workforce. They show where the UK’s outsourced workforce is concentrated. The regions with the highest share of the UK’s outsourced workforce are:
London (21%)
North West (11%)
South East (11%)
West Midlands (9%)
East Midlands (8%)
| Region | Frequency | Sum | Percentage |
|---|---|---|---|
| London | 357.35 | 1708.36 | 20.92 |
| North West | 189.39 | 1708.36 | 11.09 |
| South East | 188.47 | 1708.36 | 11.03 |
| West Midlands | 161.49 | 1708.36 | 9.45 |
| East Midlands | 140.50 | 1708.36 | 8.22 |
| Scotland | 125.82 | 1708.36 | 7.37 |
| East of England | 125.49 | 1708.36 | 7.35 |
| South West | 120.50 | 1708.36 | 7.05 |
| Yorkshire and the Humber | 119.46 | 1708.36 | 6.99 |
| Wales | 83.25 | 1708.36 | 4.87 |
| North East | 53.06 | 1708.36 | 3.11 |
| Northern Ireland | 43.56 | 1708.36 | 2.55 |
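The Percentage column can be reproduced from the Frequency column alone. The sketch below is a minimal base R illustration, assuming the Frequency values are the survey-weighted counts of outsourced workers shown in the table. The object name `region_share` is hypothetical and is not taken from the original analysis.

```r
# Weighted counts of outsourced workers per region, copied from the table above.
region_share <- data.frame(
  Region = c("London", "North West", "South East", "West Midlands",
             "East Midlands", "Scotland", "East of England", "South West",
             "Yorkshire and the Humber", "Wales", "North East",
             "Northern Ireland"),
  Frequency = c(357.35, 189.39, 188.47, 161.49, 140.50, 125.82,
                125.49, 120.50, 119.46, 83.25, 53.06, 43.56)
)

# Each region's share of the UK's outsourced workforce.
# The total is recomputed from the rounded frequencies, so it can differ
# slightly from the 1708.36 printed in the table.
region_share$Sum <- sum(region_share$Frequency)
region_share$Percentage <- 100 * region_share$Frequency / region_share$Sum

# Order from largest to smallest share.
region_share <- region_share[order(-region_share$Percentage), ]
print(region_share, digits = 4)
```

Because every row is divided by the same total, the percentages sum to 100, which is a quick check that the table describes shares of the outsourced workforce rather than of the whole UK workforce.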
Sectors
Here we explore what proportion of workers in each sector are outsourced.5
The plot below shows the proportion of outsourced and not outsourced workers within each sector, that is, which sectors have higher and lower proportions of outsourced workers.
The table below shows the percentage of outsourced workers in each sector, in descending order. The three sectors with the highest proportion of outsourced workers are:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).
A key takeaway here is that although 17% of the workforce overall is outsourced, this figure varies by sector, from 0% for Mining… and Extraterritorial organisations… up to 36% for Activities of households as employers, with 5 out of 20 sectors having at least 20% of their workforce outsourced.
Gender
The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.6 Men make up 56% of the outsourced workforce compared to 47% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are 1.44 times more likely to be male than female.7
Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the ‘high indicators’ group (66.35%), followed by the ‘likely agency’ group (56.66%), followed by the ‘outsourced’ group (53.94%). Statistically speaking, compared to a not outsourced person,
Someone in the high indicators group is 2.18 times more likely to be male than female.
Someone in the likely agency group is 1.45 times more likely to be male than female.
Someone in the outsourced group is 1.31 times more likely to be male than female.
Additionally, people identifying as ‘Other’ gender are absent from the high indicators and likely agency groups, though given the small N (14) for this group, this finding is unlikely to be meaningful.
Pay
Note
The total sample on which the income analysis is based is 8943.
The number of income data points for the outsourced group is 1512.
The number of income data points for the not outsourced group is 7431.
The table and plot below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. Regression analysis shows that outsourced workers are on average paid £2170 less than non-outsourced workers.8
| Outsourcing group | n | Mean | Median | Min | Max | Standard dev. |
|---|---|---|---|---|---|---|
| Not outsourced | 6924 | 26781.29 | 25120.67 | 2000 | 66250 | 13365.63 |
| Outsourced | 1367 | 24611.38 | 23061.99 | 2400 | 66108 | 12998.56 |
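The unadjusted gap quoted above is the difference between the two group means in the table. A weighted regression with outsourcing status as its only predictor returns the same quantity. The sketch below illustrates this on simulated data: the data frame `toy`, its columns, and all values in it are hypothetical stand-ins, not the survey data.

```r
set.seed(1)

# Simulated stand-in data: none of these values come from the survey.
n <- 500
toy <- data.frame(
  status = factor(sample(c("Not outsourced", "Outsourced"), n,
                         replace = TRUE, prob = c(0.83, 0.17))),
  weight = runif(n, 0.5, 2)  # hypothetical survey weights
)
toy$income <- 26000 - 2000 * (toy$status == "Outsourced") +
  rnorm(n, sd = 12000)

# Weighted regression with outsourcing status as the only predictor.
fit <- lm(income ~ status, data = toy, weights = weight)
gap_from_model <- unname(coef(fit)["statusOutsourced"])

# Difference in weighted means, computed directly.
wm <- sapply(split(toy, toy$status),
             function(d) weighted.mean(d$income, d$weight))
gap_from_means <- unname(wm["Outsourced"] - wm["Not outsourced"])

c(model = gap_from_model, means = gap_from_means)
```

The two numbers agree because, with a single binary predictor, the fitted value for each group is that group's weighted mean. Adding further predictors, as in the model that follows, changes the coefficient because the comparison is then made between workers who are alike on those other variables.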
Call:
lm(formula = income_annual_all ~ Age + Gender + Ethnicity_collapsed +
Region + outsourcing_status + BORNUK_labelled, data = income_data,
weights = NatRepemployees)
Weighted Residuals:
Min 1Q Median 3Q Max
-56835 -7455 -394 7794 67926
Coefficients:
Estimate Std. Error t value
(Intercept) 28615.662 650.126 44.016
Age -4.219 11.015 -0.383
GenderMale 6431.482 281.637 22.836
GenderOther 1700.233 3600.117 0.472
GenderPrefer not to say 5193.288 2790.601 1.861
Ethnicity_collapsedWhite other -250.152 814.830 -0.307
Ethnicity_collapsedBlack Caribbean -15.007 1263.434 -0.012
Ethnicity_collapsedBlack African 247.019 990.262 0.249
Ethnicity_collapsedMixed other -190.924 1461.051 -0.131
Ethnicity_collapsedSouth Asian 376.945 676.298 0.557
Ethnicity_collapsedEast Asian 6260.680 1199.399 5.220
Ethnicity_collapsedOther -233.306 728.120 -0.320
Ethnicity_collapsedBlack other -1289.272 2378.790 -0.542
Ethnicity_collapsedArab 2333.453 2516.283 0.927
RegionEast Midlands -6736.409 650.609 -10.354
RegionEast of England -4647.756 612.044 -7.594
RegionNorth East -5435.261 819.669 -6.631
RegionNorth West -5007.984 592.336 -8.455
RegionNorthern Ireland -7435.196 946.405 -7.856
RegionScotland -5051.335 636.572 -7.935
RegionSouth East -4028.341 557.217 -7.229
RegionSouth West -6473.466 635.455 -10.187
RegionWales -5539.862 775.360 -7.145
RegionWest Midlands -5911.512 619.136 -9.548
RegionYorkshire and the Humber -6209.566 633.534 -9.801
outsourcing_statusOutsourced -3058.222 381.040 -8.026
BORNUK_labelledWithin the last year -3947.598 1234.217 -3.198
BORNUK_labelledWithin the last 3 years -884.418 1089.763 -0.812
BORNUK_labelledWithin the last 5 years -864.453 1205.679 -0.717
BORNUK_labelledWithin the last 10 years -381.796 954.794 -0.400
BORNUK_labelledWithin the last 15 years 621.136 1062.603 0.585
BORNUK_labelledWithin the last 20 years 2141.949 1124.228 1.905
BORNUK_labelledWithin the last 30 years 3474.708 1300.211 2.672
BORNUK_labelledMore than 30 years ago 97.072 1028.460 0.094
BORNUK_labelledPrefer not to say -2545.965 1892.576 -1.345
Pr(>|t|)
(Intercept) < 0.0000000000000002 ***
Age 0.70169
GenderMale < 0.0000000000000002 ***
GenderOther 0.63675
GenderPrefer not to say 0.06278 .
Ethnicity_collapsedWhite other 0.75885
Ethnicity_collapsedBlack Caribbean 0.99052
Ethnicity_collapsedBlack African 0.80302
Ethnicity_collapsedMixed other 0.89604
Ethnicity_collapsedSouth Asian 0.57729
Ethnicity_collapsedEast Asian 0.00000018342942382 ***
Ethnicity_collapsedOther 0.74866
Ethnicity_collapsedBlack other 0.58784
Ethnicity_collapsedArab 0.35378
RegionEast Midlands < 0.0000000000000002 ***
RegionEast of England 0.00000000000003445 ***
RegionNorth East 0.00000000003542697 ***
RegionNorth West < 0.0000000000000002 ***
RegionNorthern Ireland 0.00000000000000446 ***
RegionScotland 0.00000000000000238 ***
RegionSouth East 0.00000000000052844 ***
RegionSouth West < 0.0000000000000002 ***
RegionWales 0.00000000000097733 ***
RegionWest Midlands < 0.0000000000000002 ***
RegionYorkshire and the Humber < 0.0000000000000002 ***
outsourcing_statusOutsourced 0.00000000000000115 ***
BORNUK_labelledWithin the last year 0.00139 **
BORNUK_labelledWithin the last 3 years 0.41706
BORNUK_labelledWithin the last 5 years 0.47340
BORNUK_labelledWithin the last 10 years 0.68926
BORNUK_labelledWithin the last 15 years 0.55887
BORNUK_labelledWithin the last 20 years 0.05678 .
BORNUK_labelledWithin the last 30 years 0.00755 **
BORNUK_labelledMore than 30 years ago 0.92480
BORNUK_labelledPrefer not to say 0.17859
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12690 on 8256 degrees of freedom
(1212 observations deleted due to missingness)
Multiple R-squared: 0.09777, Adjusted R-squared: 0.09406
F-statistic: 26.31 on 34 and 8256 DF, p-value: < 0.00000000000000022
This difference increases to £3058 when we take into account Age, Gender, Ethnicity, Region, and Arrival Time.9 This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average,
Men earn £6431 more than women.
East Asian workers earn £6261 more than White British workers.
Workers in all non-London regions earn less than workers in London
East Midlands: -£6736
East of England: -£4648
North East: -£5435
North West: -£5008
Northern Ireland: -£7435
Scotland: -£5051
South East: -£4028
South West: -£6473
Wales: -£5540
West Midlands: -£5912
Yorkshire and the Humber: -£6210
People who arrived in the UK within the last year earn £3948 less than people born in the UK
People who arrived within the last 30 years earn £3475 more than people born in the UK.
Call:
glm(formula = income_group ~ Age + Gender + Ethnicity_collapsed +
Region + outsourcing_status + BORNUK_labelled, family = "quasibinomial",
data = income_data, weights = NatRepemployees)
Coefficients:
Estimate Std. Error t value
(Intercept) -0.8387681 0.1195848 -7.014
Age 0.0072008 0.0020327 3.542
GenderMale -1.0859473 0.0542846 -20.005
GenderOther 0.0293508 0.5957292 0.049
GenderPrefer not to say -0.2536273 0.4756620 -0.533
Ethnicity_collapsedWhite other 0.0322148 0.1546676 0.208
Ethnicity_collapsedBlack Caribbean 0.0164159 0.2289889 0.072
Ethnicity_collapsedBlack African -0.0980330 0.1893781 -0.518
Ethnicity_collapsedMixed other 0.4266644 0.2519140 1.694
Ethnicity_collapsedSouth Asian -0.0842664 0.1299310 -0.649
Ethnicity_collapsedEast Asian -0.7200984 0.2663643 -2.703
Ethnicity_collapsedOther 0.1222849 0.1332352 0.918
Ethnicity_collapsedBlack other 0.1995378 0.4247030 0.470
Ethnicity_collapsedArab -0.0152309 0.4750358 -0.032
RegionEast Midlands 0.0510593 0.1181909 0.432
RegionEast of England 0.0004309 0.1128959 0.004
RegionNorth East -0.2299131 0.1549687 -1.484
RegionNorth West -0.2885660 0.1121704 -2.573
RegionNorthern Ireland 0.1391950 0.1660068 0.838
RegionScotland 0.0598447 0.1164880 0.514
RegionSouth East -0.0356854 0.1026491 -0.348
RegionSouth West -0.0651372 0.1166707 -0.558
RegionWales -0.3793558 0.1501366 -2.527
RegionWest Midlands -0.0194424 0.1146348 -0.170
RegionYorkshire and the Humber -0.0750691 0.1174169 -0.639
outsourcing_statusOutsourced 0.4099842 0.0683805 5.996
BORNUK_labelledWithin the last year 0.2638119 0.2249751 1.173
BORNUK_labelledWithin the last 3 years -0.1599420 0.2119792 -0.755
BORNUK_labelledWithin the last 5 years -0.0956817 0.2305092 -0.415
BORNUK_labelledWithin the last 10 years -0.1804185 0.1820608 -0.991
BORNUK_labelledWithin the last 15 years -0.1795682 0.2037879 -0.881
BORNUK_labelledWithin the last 20 years -0.2939555 0.2246548 -1.308
BORNUK_labelledWithin the last 30 years -0.5972754 0.2781114 -2.148
BORNUK_labelledMore than 30 years ago -0.1180208 0.1907909 -0.619
BORNUK_labelledPrefer not to say 0.6917427 0.3193442 2.166
Pr(>|t|)
(Intercept) 0.0000000000025 ***
Age 0.000399 ***
GenderMale < 0.0000000000000002 ***
GenderOther 0.960706
GenderPrefer not to say 0.593903
Ethnicity_collapsedWhite other 0.835012
Ethnicity_collapsedBlack Caribbean 0.942851
Ethnicity_collapsedBlack African 0.604711
Ethnicity_collapsedMixed other 0.090362 .
Ethnicity_collapsedSouth Asian 0.516649
Ethnicity_collapsedEast Asian 0.006877 **
Ethnicity_collapsedOther 0.358744
Ethnicity_collapsedBlack other 0.638490
Ethnicity_collapsedArab 0.974423
RegionEast Midlands 0.665748
RegionEast of England 0.996954
RegionNorth East 0.137951
RegionNorth West 0.010112 *
RegionNorthern Ireland 0.401780
RegionScotland 0.607447
RegionSouth East 0.728116
RegionSouth West 0.576655
RegionWales 0.011531 *
RegionWest Midlands 0.865327
RegionYorkshire and the Humber 0.522621
outsourcing_statusOutsourced 0.0000000021121 ***
BORNUK_labelledWithin the last year 0.240979
BORNUK_labelledWithin the last 3 years 0.450560
BORNUK_labelledWithin the last 5 years 0.678088
BORNUK_labelledWithin the last 10 years 0.321725
BORNUK_labelledWithin the last 15 years 0.378261
BORNUK_labelledWithin the last 20 years 0.190748
BORNUK_labelledWithin the last 30 years 0.031774 *
BORNUK_labelledMore than 30 years ago 0.536205
BORNUK_labelledPrefer not to say 0.030329 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 1.010028)
Null deviance: 9643.7 on 8290 degrees of freedom
Residual deviance: 9129.5 on 8256 degrees of freedom
(1212 observations deleted due to missingness)
AIC: NA
Number of Fisher Scoring iterations: 4
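The estimates in the summary above are on the log-odds scale, which is hard to read directly. Exponentiating a coefficient converts it to an odds ratio. The sketch below does this for three coefficients copied from the output. It assumes that `income_group` is coded so that the modelled outcome is membership of the low income group, as the surrounding text implies.

```r
# Log-odds estimates copied from the model summary above.
log_odds <- c(
  Age = 0.0072008,
  GenderMale = -1.0859473,
  outsourcing_statusOutsourced = 0.4099842
)

# Exponentiating a log-odds coefficient gives an odds ratio.
odds_ratio <- exp(log_odds)
round(odds_ratio, 3)
```

An odds ratio above 1 means higher odds of being in the low income group than the reference category, and a value below 1 means lower odds. Under the coding assumption above, outsourced workers have roughly 1.5 times the odds of being low paid compared with otherwise similar non-outsourced workers.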
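A person is more likely to be in the low income group if they are:
Older
Outsourced
Among those who preferred not to say when they arrived in the UK
Conversely, a person is less likely to be in the low income group if they are male, East Asian, based in the North West or Wales, or arrived in the UK within the last 30 years.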
Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, females are paid £5801 less than males. For outsourced workers, females are paid £6400 less than males. The difference between these two gaps is not significant.
We also tested whether the outsourcing * gender interaction is relevant for whether someone is low paid. It is not: none of the interaction terms in the model below is significant.
Call:
glm(formula = income_group ~ Age + Ethnicity_collapsed + Region +
Gender * outsourcing_status + BORNUK_labelled, family = "quasibinomial",
data = income_data, weights = NatRepemployees)
Coefficients:
Estimate Std. Error
(Intercept) -0.836001 0.119859
Age 0.007206 0.002034
Ethnicity_collapsedWhite other 0.031301 0.154903
Ethnicity_collapsedBlack Caribbean 0.016947 0.229041
Ethnicity_collapsedBlack African -0.099353 0.189360
Ethnicity_collapsedMixed other 0.426446 0.251918
Ethnicity_collapsedSouth Asian -0.084982 0.129932
Ethnicity_collapsedEast Asian -0.718429 0.266475
Ethnicity_collapsedOther 0.120877 0.133363
Ethnicity_collapsedBlack other 0.204135 0.424644
Ethnicity_collapsedArab -0.019986 0.475153
RegionEast Midlands 0.052898 0.118287
RegionEast of England 0.002726 0.113085
RegionNorth East -0.228401 0.155101
RegionNorth West -0.286432 0.112320
RegionNorthern Ireland 0.138989 0.166454
RegionScotland 0.061414 0.116618
RegionSouth East -0.034806 0.102764
RegionSouth West -0.064468 0.116821
RegionWales -0.378192 0.150222
RegionWest Midlands -0.017539 0.114761
RegionYorkshire and the Humber -0.073335 0.117541
GenderMale -1.098130 0.060624
GenderOther -0.037706 0.683621
GenderPrefer not to say -0.316806 0.530583
outsourcing_statusOutsourced 0.380773 0.091369
BORNUK_labelledWithin the last year 0.266885 0.224952
BORNUK_labelledWithin the last 3 years -0.155708 0.212099
BORNUK_labelledWithin the last 5 years -0.096021 0.230629
BORNUK_labelledWithin the last 10 years -0.178118 0.182137
BORNUK_labelledWithin the last 15 years -0.176618 0.203862
BORNUK_labelledWithin the last 20 years -0.294419 0.224726
BORNUK_labelledWithin the last 30 years -0.596186 0.278280
BORNUK_labelledMore than 30 years ago -0.116160 0.191043
BORNUK_labelledPrefer not to say 0.690384 0.319460
GenderMale:outsourcing_statusOutsourced 0.062132 0.135496
GenderOther:outsourcing_statusOutsourced 0.304344 1.425335
GenderPrefer not to say:outsourcing_statusOutsourced 0.349368 1.219391
t value
(Intercept) -6.975
Age 3.543
Ethnicity_collapsedWhite other 0.202
Ethnicity_collapsedBlack Caribbean 0.074
Ethnicity_collapsedBlack African -0.525
Ethnicity_collapsedMixed other 1.693
Ethnicity_collapsedSouth Asian -0.654
Ethnicity_collapsedEast Asian -2.696
Ethnicity_collapsedOther 0.906
Ethnicity_collapsedBlack other 0.481
Ethnicity_collapsedArab -0.042
RegionEast Midlands 0.447
RegionEast of England 0.024
RegionNorth East -1.473
RegionNorth West -2.550
RegionNorthern Ireland 0.835
RegionScotland 0.527
RegionSouth East -0.339
RegionSouth West -0.552
RegionWales -2.518
RegionWest Midlands -0.153
RegionYorkshire and the Humber -0.624
GenderMale -18.114
GenderOther -0.055
GenderPrefer not to say -0.597
outsourcing_statusOutsourced 4.167
BORNUK_labelledWithin the last year 1.186
BORNUK_labelledWithin the last 3 years -0.734
BORNUK_labelledWithin the last 5 years -0.416
BORNUK_labelledWithin the last 10 years -0.978
BORNUK_labelledWithin the last 15 years -0.866
BORNUK_labelledWithin the last 20 years -1.310
BORNUK_labelledWithin the last 30 years -2.142
BORNUK_labelledMore than 30 years ago -0.608
BORNUK_labelledPrefer not to say 2.161
GenderMale:outsourcing_statusOutsourced 0.459
GenderOther:outsourcing_statusOutsourced 0.214
GenderPrefer not to say:outsourcing_statusOutsourced 0.287
Pr(>|t|)
(Intercept) 0.0000000000033 ***
Age 0.000397 ***
Ethnicity_collapsedWhite other 0.839866
Ethnicity_collapsedBlack Caribbean 0.941020
Ethnicity_collapsedBlack African 0.599823
Ethnicity_collapsedMixed other 0.090531 .
Ethnicity_collapsedSouth Asian 0.513098
Ethnicity_collapsedEast Asian 0.007031 **
Ethnicity_collapsedOther 0.364765
Ethnicity_collapsedBlack other 0.630728
Ethnicity_collapsedArab 0.966450
RegionEast Midlands 0.654743
RegionEast of England 0.980770
RegionNorth East 0.140899
RegionNorth West 0.010786 *
RegionNorthern Ireland 0.403742
RegionScotland 0.598469
RegionSouth East 0.734847
RegionSouth West 0.581067
RegionWales 0.011836 *
RegionWest Midlands 0.878532
RegionYorkshire and the Humber 0.532706
GenderMale < 0.0000000000000002 ***
GenderOther 0.956016
GenderPrefer not to say 0.550463
outsourcing_statusOutsourced 0.0000311236288 ***
BORNUK_labelledWithin the last year 0.235495
BORNUK_labelledWithin the last 3 years 0.462893
BORNUK_labelledWithin the last 5 years 0.677168
BORNUK_labelledWithin the last 10 years 0.328134
BORNUK_labelledWithin the last 15 years 0.386317
BORNUK_labelledWithin the last 20 years 0.190190
BORNUK_labelledWithin the last 30 years 0.032191 *
BORNUK_labelledMore than 30 years ago 0.543184
BORNUK_labelledPrefer not to say 0.030716 *
GenderMale:outsourcing_statusOutsourced 0.646568
GenderOther:outsourcing_statusOutsourced 0.830923
GenderPrefer not to say:outsourcing_statusOutsourced 0.774495
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasibinomial family taken to be 1.010833)
Null deviance: 9643.7 on 8290 degrees of freedom
Residual deviance: 9129.2 on 8253 degrees of freedom
(1212 observations deleted due to missingness)
AIC: NA
Number of Fisher Scoring iterations: 4
Notable takeaways:
There is a substantial gender pay gap in the data. The pay gap does not differ significantly between outsourced and non-outsourced workers.
The South East is the highest-paid region after London. Northern Ireland is the lowest-paid region.
People who have very recently arrived in the UK are paid less than people who were born in the UK, whilst people who migrated to the UK a long time ago earn more than people born in the UK.
Next we explore differences by outsourcing group. The table and plot below show descriptive statistics on income and its distribution for outsourced groups. Regression analysis shows that outsourced workers are on average paid £3100 less than non-outsourced workers, while no differences are evident for the likely agency and high indicators groups.11
| Outsourcing group | n | Mean | Median | Min | Max | Standard dev. |
|---|---|---|---|---|---|---|
| Not outsourced | 6924 | 26781.29 | 25120.67 | 2000.0 | 66250.00 | 13365.63 |
| Outsourced | 897 | 23680.86 | 22165.73 | 2400.0 | 66000.00 | 12783.87 |
| Likely agency | 231 | 25081.11 | 22800.00 | 3194.7 | 65846.67 | 13702.90 |
| High indicators | 239 | 27921.52 | 25860.36 | 4644.0 | 65000.00 | 12629.15 |
However, when controlling, as before, for Age, Gender, Ethnicity, Arrival Time, and Region,12 we find
the outsourced group on average earns £3745 less than the non-outsourced group, and
the likely agency group on average earns £2474 less than the non-outsourced group
In addition to showing that likely agency workers receive lower pay than the non-outsourced workers, this analysis reveals that “pure outsourced” workers’ pay is even lower, and that the estimate we obtained in the analysis above considering only status is a diluted effect averaging the outsourced and likely agency pay gaps.
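The same dilution can be seen in the unadjusted descriptive statistics. The sketch below pools the three outsourcing groups from the table and compares the pooled gap with the gap for the ‘outsourced’ group alone. It uses the unweighted n as a stand-in for the survey weights, which is an assumption, so the pooled mean only approximates the figure reported for all outsourced workers.

```r
# Group sizes and mean incomes copied from the table above.
groups <- data.frame(
  group = c("Outsourced", "Likely agency", "High indicators"),
  n     = c(897, 231, 239),
  mean  = c(23680.86, 25081.11, 27921.52)
)
not_outsourced_mean <- 26781.29

# Gap for each group relative to non-outsourced workers.
groups$gap <- groups$mean - not_outsourced_mean

# Pooled mean across all three groups. The unweighted n is used here as a
# stand-in for the survey weights, so this only approximates the report.
pooled_mean <- weighted.mean(groups$mean, groups$n)
pooled_gap  <- pooled_mean - not_outsourced_mean

groups
c(pooled_mean = pooled_mean, pooled_gap = pooled_gap)
```

The pooled gap is smaller in magnitude than the gap for the ‘outsourced’ group because the other two groups, with higher mean incomes, pull the pooled mean upwards.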
Variations in pay
Exploring this by type of outsourced worker shows that for all sectors, the majority of outsourced workers fall into the ‘outsourced’ group.13
The next most common group after ‘outsourced’ varies by sector. Many sectors have an almost even split of likely agency and high indicator groups. Sectors that are notable for having quite large likely agency proportions relative to high indicator proportions are:
Construction
Accommodation and food service activities
Activities of households as employers (note N = 32)
In contrast, sectors with a high proportion of ‘high indicators’ workers relative to likely agency workers are:
Other service activities
Professional, scientific and technical activities
Real estate activities
Variations in pay
Ethnicity
People from an ethnic minority are 1.829 times more likely to be outsourced than people from a White British background; 33.09% of outsourced workers are from an ethnic minority, compared to 21.99% of non-outsourced workers.14
Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others15:
White other workers are 1.429 times more likely than White workers to be outsourced.
Black African workers are 2.752 times more likely than White workers to be outsourced.
Mixed other workers are 2.19 times more likely than White workers to be outsourced.
South Asian workers are 2.25 times more likely than White workers to be outsourced.
Other workers are 1.619 times more likely than White workers to be outsourced.
Black other workers are 2.659 times more likely than White workers to be outsourced.
Arab workers are 3.393 times more likely than White workers to be outsourced.
Black Caribbean and East Asian workers are no more or less likely than White workers to be outsourced.
Breaking down by outsourcing group helps to separate out the type of outsourced work people from the ethnicities identified above engage in.16 Compared to White British workers,
White other workers are more likely to be outsourced than not outsourced
Black African workers are more likely to be outsourced, likely agency, or high indicators than not outsourced
Mixed other workers are more likely to be likely agency workers than not outsourced
South Asian workers are more likely to be outsourced, likely agency, or high indicators than not outsourced
Other workers are more likely to be outsourced or likely agency than not outsourced
Black other workers are more likely to be outsourced than not outsourced
Arab workers are more likely to be likely agency or high indicators than not outsourced
Arrival in the UK
As for non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. 24.13% of outsourced workers were not born in the UK, compared to 14.08% of non-outsourced workers.17 This difference is statistically significant; outsourced workers are 1.94 times more likely to have been born outside the UK than non-outsourced workers.18
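The ‘times more likely’ figures in this report behave like odds ratios. As a worked check, the sketch below recomputes the figure above from the two percentages in the text, assuming it comes from a model with birthplace as a single binary variable. The helper `odds()` is defined here for illustration only.

```r
# Share of each group born outside the UK, from the text above.
p_outsourced     <- 0.2413
p_not_outsourced <- 0.1408

# Convert a proportion to odds.
odds <- function(p) p / (1 - p)

# Odds ratio: odds of being born outside the UK for outsourced workers
# relative to non-outsourced workers.
odds_ratio <- odds(p_outsourced) / odds(p_not_outsourced)
round(odds_ratio, 2)
```

This gives roughly 1.94, in line with the reported figure. It is a ratio of odds rather than of percentages: dividing 24.13 by 14.08 directly gives about 1.71, so the two should not be read interchangeably.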
Note
This variable is worded a little strangely, e.g. responses are things like “within the last 10 years”, “within the last 15 years”. Given that respondents only give one answer to this question, I think we can assume that the responses are basically brackets. That is, someone responding “within the last 15 years” is basically saying “I came to the UK between 11 and 15 years ago”.
Looking at the figure below, compared to non-outsourced people, there is a larger proportion of outsourced workers for each arrival time apart from ‘Within the last 30 years’.
Exploring types of outsourced work indicates that the pattern observed above applies evenly to the different outsourcing groups.19 Compared to people born in the UK, people not born in the UK are:
1.97 times more likely to be outsourced than non-outsourced
1.82 times more likely to be likely agency than non-outsourced
1.93 times more likely to be high indicators than non-outsourced
The figure below indicates that the proportions of workers in each outsourcing group within each arrival time are broadly similar.
Exploring the intersection of ethnicity and arrival time reveals some patterns whereby the likelihood of a person being outsourced is related to the combinations of ethnicity and whether they were born in the UK.20 The plot below shows that
Among workers born in the UK, a Black African worker is 2.73 times more likely to be outsourced than a White worker.
Among workers born in the UK, a South Asian worker is 2.42 times more likely to be outsourced than a White worker.
Among workers not born in the UK, a Black African worker is 1.97 times more likely to be outsourced than a White other worker.
Among White workers, someone not born in the UK is 2.32 times more likely to be outsourced than someone born in the UK.
Among Mixed other workers, someone not born in the UK is 2.84 times more likely to be outsourced than someone born in the UK.
Among Other workers, someone not born in the UK is 1.64 times more likely to be outsourced than someone born in the UK.
Put differently, being born in the UK is relevant in predicting outsourcing status only for White, Mixed other, and Other ethnicities. For other ethnicities, it doesn’t matter whether you are born in the UK or not. And compared to a White person born in the UK, Black African and South Asian workers are more likely to be outsourced whether or not they were born in the UK.
Overall, this pattern of results paints a racialised picture with strong colonial undertones. UK-born Black African and South Asian workers are more likely than UK-born White workers to be outsourced. For these and most other non-White groups, being born in the UK is not relevant for predicting outsourcing status; a Black African worker is just as likely to be outsourced if they arrived in the UK today as if they were born in the UK. However, the story is not only one of race. Non-UK-born White people are more likely to be outsourced than UK-born White people.
In summary, people born in the UK are more likely to be outsourced if they are Black African or South Asian compared to White, and White and (mixed) other ethnicities are more likely to be outsourced if they are not born in the UK compared to if they were born in the UK.
We next explore arrival time by collapsing responses to the arrival time question into fewer categories as below
Someone who came to the UK recently is 3.37 times more likely to be outsourced than someone born in the UK.
Someone who came to the UK not recently is 1.85 times more likely to be outsourced than someone born in the UK.
Someone who preferred to not say when they arrived is 2.32 times more likely to be outsourced than someone born in the UK.
Among East Asian workers
Someone who came to the UK not recently is 3.61 times more likely to be outsourced than someone born in the UK.
Someone who came to the UK not recently is 11.91 times more likely to be outsourced than someone who came to the UK recently
Among Other workers
Someone who came to the UK not recently is times more likely to be outsourced than someone born in the UK.
In summary,
White outsourced workers are more likely to have not been born in the UK
East Asian and Other outsourced workers are more likely to have been in the UK a longer time (10 years plus)
UK-born Black African and South Asian workers are more likely to be outsourced than White UK-born workers, but no more or less likely to be outsourced than non-UK born Black African and South Asian workers
Characteristics of outsourced work
Major occupations
Variations in pay
For Elementary occupations, there is a clear divergence evident in the pattern; for high income workers, being outsourced increases average income, whereas for low income workers, being outsourced decreases average income. For most other groups, being outsourced is associated with a lower income, regardless of income group.
Variations in pay
Unit occupations
Examining what unit occupations outsourced workers can be found in reveals that outsourced workers tend to be concentrated in a specific cluster of occupations.24 42% of outsourced workers are located in the top 10 most common unit occupations. The top 15 unit occupations capture over 50% of the outsourced workforce, and 76% of the outsourced workforce are captured in 30 unit occupations (out of a total of 96). These thresholds are shown in the plot below where the blue lines intersect the red curve.
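The coverage thresholds described above come from a cumulative share curve: occupations are sorted from largest to smallest and their shares are summed in order. The sketch below shows the calculation on simulated counts. The vector `occupation_weight` is hypothetical and does not contain survey data, so the numbers it prints will not match the report.

```r
set.seed(42)

# Hypothetical weighted counts for 96 unit occupations (not survey data).
occupation_weight <- sort(rexp(96, rate = 1), decreasing = TRUE)

# Share of the outsourced workforce in each occupation, largest first,
# and the running total of those shares.
share <- occupation_weight / sum(occupation_weight)
cumulative_share <- cumsum(share)

# Share captured by the top 10, 15 and 30 occupations.
top_k <- c(10, 15, 30)
coverage <- setNames(cumulative_share[top_k], paste0("top_", top_k))
round(100 * coverage, 1)

# Smallest number of occupations needed to cover at least half the workforce.
k_half <- which(cumulative_share >= 0.5)[1]
k_half
```

Sorting before accumulating is what makes the curve steep at first and flat later, so the points where it crosses a given percentage identify how concentrated the workforce is.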
The top 10 unit occupations for outsourced workers are:
Functional Managers and Directors
Sales Assistants and Retail Cashiers
Caring Personal Services
Other Administrative Occupations
Information Technology Professionals
Elementary Cleaning Occupations
Teaching Professionals
Other Elementary Services Occupations
Road Transport Drivers
Nursing Professionals
These occupations differ in the extent to which outsourced workers are low paid.25 The 5 occupations with the highest proportion of low paid outsourced workers are:
The first map emphasises that London has the highest concentration of outsourced workers (`r round(region_statistics_2[which(region_statistics_2$Region == "London" & region_statistics_2["Outsourcing status"] == "Outsourced"), "Percentage"],0)`%).

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region.svg')
```

The second map excludes London so that it is easier to see how the remaining regions compare. After London, the regions with the highest proportion of outsourced workers are:

1. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 4), "Percentage"],0)`%)
5. 
`r region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Region"]` (`r round(region_statistics_2_1[which(rank(-region_statistics_2_1$Percentage) == 5), "Percentage"],0)`%)

```{r}
knitr::include_graphics('../outputs/figures/outsourcing_by_region_excl_london.svg')
```

```{r}
region_statistics_3 <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region) %>%
  summarise(
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(region_statistics_3, file = "../outputs/data/region_stats_3.csv")
```

We can also explore how the UK's outsourced workforce is distributed across the country.[^4] The table and map below show the outsourced workers in each region as a percentage of the UK's total outsourced workforce. They show where the UK's outsourced workforce is concentrated. The regions with the highest share of the UK's outsourced workforce are:

[^4]: [outputs/data/region_stats_3.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

1. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 1), "Percentage"],0)`%)
2. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 2), "Percentage"],0)`%)
3. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 3), "Percentage"],0)`%)
4. `r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 4), "Percentage"],0)`%)
5. 
`r region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Region"]` (`r round(region_statistics_3[which(rank(-region_statistics_3$Percentage) == 5), "Percentage"],0)`%)```{r}region_statistics_3 %>%mutate(Region = haven::as_factor(Region) ) %>%arrange(desc(Percentage)) %>% knitr::kable(.,digits =2) %>%kable_styling(full_width = F)``````{r}knitr::include_graphics('../outputs/figures/outsourcing_distribution_across_regions.svg')```## SectorsHere we explore what proportion of workers in each sector are outsourced.[^5][^5]: [outputs/data/sector_summary_3.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)```{r sector-summary-3}sector_summary_3 <- data %>% #filter(income_drop_all == 0) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), # avg_income = mean(income_annual_all, na.rm=T), # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )write_csv(sector_summary_3, file="../outputs/data/sector_summary_3.csv")```The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. 
this is showing what sectors have higher and lower proportions of outsourced workers.```{r sector-plot-2}plot_data <- sector_summary_3 %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))outsourced <- plot_data %>% filter(outsourcing_status == 'Outsourced') %>% mutate( rank = rank(desc(perc)) )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), )# annotation_df <- plot_data %>%# select(SectorName_short, outsourcing_status, perc, n# mutate(annotation_df <- plot_data %>% filter(outsourcing_status == "Not outsourced") %>% select(SectorName_short, N) %>% mutate( ypos = 80 )ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_status)) + geom_col() + geom_text(inherit.aes=F,data=annotation_df, aes(x=SectorName_short, y=ypos, label = paste0("N = ", N)), hjust=1, nudge_y = 15) + coord_flip() + scale_fill_manual(values=many_colours) + scale_y_continuous(breaks=seq(0,100,10))# sector_key <- data.frame("number" = seq(1,length(unique(plot_data$SectorName_labelled)),1),# "Sector" = levels(plot_data$SectorName_labelled))# # sector_key %>%# kable() %>%# kable_styling(full_width = F)```The table below shows the percentage of outsourced workers in each Sector, ordered descending by percentage. 
It shows that the three sectors with the highest proportion of outsourced workers are:

- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==3])` (note that N = 31)
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==4])`
- `r unique(plot_data$SectorName_labelled[plot_data$SectorName==22])`

Note that an undefined sector ('Not found') contained one of the largest proportions of outsourced workers (`r round(plot_data$perc[which(plot_data$SectorName==16 & plot_data$outsourcing_status=="Outsourced")],0)`% of workers in the 'Not found' category were outsourced).

A key takeaway here is that whereas 17% of the total workforce is outsourced, this figure varies by sector, from 0% for Mining... and Extraterritorial organisations... all the way to `r round(outsourced[which(outsourced$rank==1),'perc'],0)`% for `r outsourced[which(outsourced$rank==1),'SectorName_short']`, with 5 out of 20 sectors having at least 20% of their workforce outsourced.

## Gender

```{r}
gender_statistics <- data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(gender_statistics, file="../outputs/data/gender_statistics.csv")
```

```{r gender-outsourcing-status}
mod <- multinom(Gender ~ outsourcing_status, data, weights=NatRepemployees)
#summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep="_")
z <- coefs/summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)
coefs <- cbind(coefs, ors, p) %>% as_tibble()
write_csv(coefs, file = "../outputs/data/gender_inferential_tab.csv")
```

The outsourced workforce consists of a greater proportion of males than the non-outsourced workforce.[^6] Men make up `r 
round(gender_statistics[which(gender_statistics$outsourcing_status == "Outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the outsourced workforce compared to `r round(gender_statistics[which(gender_statistics$outsourcing_status == "Not outsourced" & gender_statistics$Gender == "Male"),"Percentage"], 0)`% of the non-outsourced workforce. This difference is statistically significant; outsourced workers, compared to non-outsourced workers, are `r round(sig_ors['Male', 'outsourcing_statusOutsourced'], 2)` times more likely to be male than female.[^7][^6]: [outputs/data/sector_summary_3.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_3.csv)[^7]: [../outputs/data/gender_inferential_tab.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/gender_inferential_tab.csv)```{r}# gender_statistics %>%# kable() %>%# kable_styling(full_width = F)gender_statistics %>%ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) +geom_col(colour="black") +# annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +coord_flip() +scale_fill_manual(values=colours) +theme_minimal() +xlab("Outsourcing group") +annotate("text", x = gender_statistics$outsourcing_status, y =99, label =paste0("N = ", gender_statistics$N), hjust=1) ``````{r}gender_statistics_2 <- data %>%group_by(outsourcing_group, Gender) %>%summarise(n =n(),Frequency =sum(NatRepemployees) ) %>%mutate(N =sum(n),Sum =sum(Frequency),Percentage =100* (Frequency / Sum) )readr::write_csv(gender_statistics_2, file="../outputs/data/gender_statistics_2.csv")``````{r gender-outsourcing-group}mod <- multinom(Gender ~ outsourcing_group, data, weights=NatRepemployees)#summary(mod)# get coefficients and calcualte pcoefs <- summary(mod)$coefficientsors <- exp(coefs)colnames(ors) <- paste(colnames(ors), "or", sep="_")z <- coefs/summary(mod)$standard.errorsp <- (1 - pnorm(abs(z), 0, 1)) * 2colnames(p) 
<- paste(colnames(p), "p", sep="_")
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>% as_tibble()
write_csv(coefs, file = "../outputs/data/gender_inferential_tab_2.csv")
```

Breaking down by outsourcing group, we find that the group with the largest proportion of men in the workforce is the 'high indicators' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="High indicators" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'likely agency' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Likely agency" & Gender == "Male") %>% pull(Percentage), 2)`%), followed by the 'outsourced' group (`r round(gender_statistics_2 %>% filter(outsourcing_group=="Outsourced" & Gender == "Male") %>% pull(Percentage), 2)`%). Statistically speaking, compared to a non-outsourced person,

- Someone in the high indicators group is `r round(sig_ors['Male', 'outsourcing_groupHigh indicators'],2)` times more likely to be male than female.
- Someone in the likely agency group is `r round(sig_ors['Male', 'outsourcing_groupLikely agency'],2)` times more likely to be male than female.
- Someone in the outsourced group is `r round(sig_ors['Male', 'outsourcing_groupOutsourced'],2)` times more likely to be male than female.

Additionally, people identifying as 'Other' gender are absent from the high indicators and likely agency groups, though given the small N (`r sum(data$Gender=="Other")`) for this group, this finding is unlikely to be meaningful.

```{r}
# gender_statistics_2 %>%
#   kable() %>%
#   kable_styling(full_width = F)

gender_statistics_2 %>%
  ggplot(., aes(outsourcing_group, Percentage, fill = Gender)) +
  geom_col(colour="black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values=colours) +
  theme_minimal() +
  xlab("Outsourcing group") 
+annotate("text", x = gender_statistics_2$outsourcing_group, y =99, label =paste0("N = ", gender_statistics_2$N), hjust=1) ```## Pay::: callout-noteNote, the total sample on which income analysis is based is `r sum(!is.na(data$income_annual_all))`.The number of income data points for the outsourced group is `r data %>% filter(outsourcing_status=="Outsourced") %>% summarise(sum(!is.na(income_annual_all))) %>% pull()`The number of income data points for the not outsourced group is `r data %>% filter(outsourcing_status=="Not outsourced") %>% summarise(sum(!is.na(income_annual_all))) %>% pull()`:::```{r income}# filter to just cases where income is abovve the fifth percentile and lower than the 95th? I.e., drop the top and bottom 5%.income_statistics <- data %>% filter(income_drop_all == 0 & !is.na(income_annual_all)) %>% group_by(outsourcing_status) %>% summarise( n = n(), mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T), median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T), min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T), max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T), stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T)) )readr::write_csv(income_statistics, file="../outputs/data/income_stats_o-status.csv")income_data <- filter(data, income_drop_all==0)mod <- lm(income_annual_all ~ outsourcing_status, income_data, weights = NatRepemployees)# summary(mod)coef_table <- extract_lm_coefs(mod)rownames(coef_table) <- coef_table$variablesig_coefs <- extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, file="../outputs/data/model_income_by_o-status.csv")```The table and plot below show descriptive statistics on income and its distribution for outsourced and non-outsourced people. 
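
The unadjusted model fitted above regresses income on a single two-level factor with survey weights, so its coefficient is simply the difference between the two weighted group means. The following is a minimal sketch of that equivalence using made-up numbers: the `toy` data frame, its columns, and the object names are hypothetical and are not part of the survey data or the analysis pipeline.

```r
# Toy data: invented incomes and weights, NOT the survey data.
toy <- data.frame(
  status = factor(
    c("Not outsourced", "Not outsourced", "Not outsourced",
      "Outsourced", "Outsourced", "Outsourced"),
    levels = c("Not outsourced", "Outsourced")
  ),
  income = c(30000, 26000, 41000, 22000, 27000, 24000),
  weight = c(1.2, 0.8, 1.0, 1.5, 0.7, 0.8)
)

# Weighted mean income within each group
wtd_means <- sapply(
  split(toy, toy$status),
  function(d) weighted.mean(d$income, w = d$weight)
)

# Weighted regression of income on status (same form as the unadjusted model)
toy_mod <- lm(income ~ status, data = toy, weights = weight)

# The slope equals the difference between the two weighted means
gap_from_means <- unname(wtd_means["Outsourced"] - wtd_means["Not outsourced"])
gap_from_model <- unname(coef(toy_mod)["statusOutsourced"])
all.equal(gap_from_means, gap_from_model)
```

This equivalence holds only for the unadjusted model; once covariates such as Age or Region are added, the coefficient is an adjusted difference and no longer matches the raw gap in weighted means.
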
Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` less than non-outsourced workers**.[^8][^8]: [outputs/data/income_stats_o-status.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-status.csv) & [outputs/data/model_income_by_o-status.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-status.csv)```{r income-plot}knitr::kable(income_statistics, digits = 2, col.names = c("Outsourcing group", "n", "Mean", "Median", "Min", "Max", "Standard dev.")) %>% kable_styling(full_width = F)# plot the distribution of income for the two groupsdata %>% filter(income_drop_all == 0 & !is.na(income_annual_all)) %>% ggplot(., aes(outsourcing_status, income_annual_all)) + geom_violin() + geom_boxplot(width = 0.3) + geom_text(inherit.aes=F, data=income_statistics, aes(outsourcing_status, y = 6e+04), label=paste0("Mean = ", round(income_statistics$mean,0),"\n", "Median = ", income_statistics$median), nudge_x = 0.1, hjust=0) + coord_cartesian(xlim=c(1,2.5)) + theme_minimal() + xlab("Outsourcing status") + ylab("Annual income") + coord_cartesian(ylim = c(plyr::round_any(min(income_statistics$min), 5000, f = floor),plyr::round_any(max(income_statistics$max),5000, f = ceiling))) + scale_y_continuous(breaks = seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max),5000, f = ceiling), 10000))``````{r}mod_2 <-lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status, income_data, weights = NatRepemployees)# summary(mod_2)mod_3 <-update(mod_2, ~.+ BORNUK_labelled) summary(mod_3)# anova(mod_2, mod_3) # adding BORNUK improves model fitcoef_table <-extract_lm_coefs(mod_3)rownames(coef_table) <- coef_table$variablesig_coefs <-extract_glm_coefs(mod, only_sig = T)write_csv(coef_table, file="../outputs/data/model_2_income_by_o-status.csv")```This 
difference increases to £`r abs(round(coef_table['outsourcing_statusOutsourced','Estimate'],0))` when we take into account Age, Gender, Ethnicity, Region, and Arrival Time. [^9] This analysis shows that all other variables, apart from Age, are in some way relevant to income. On average,[^9]: [outputs/data/model_2_income_by_o-status.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-status.csv)- Men earn £`r abs(round(coef_table['GenderMale','Estimate'],0))` more than women.- East Asian workers earn £`r abs(round(coef_table['Ethnicity_collapsedEast Asian','Estimate'],0))` more than White British workers.- Workers in all non-London regions earn less than workers in London - East Midlands: -£`r abs(round(coef_table['RegionEast Midlands','Estimate'],0))` - East of England: -£`r abs(round(coef_table['RegionEast of England','Estimate'],0))` - North East: -£`r abs(round(coef_table['RegionNorth East','Estimate'],0))` - North West: -£`r abs(round(coef_table['RegionNorth West','Estimate'],0))` - Northern Ireland: -£`r abs(round(coef_table['RegionNorthern Ireland','Estimate'],0))` - Scotland: -£`r abs(round(coef_table['RegionScotland','Estimate'],0))` - South East: -£`r abs(round(coef_table['RegionSouth East','Estimate'],0))` - Wales: -£`r abs(round(coef_table['RegionWales','Estimate'],0))` - West Midlands: -£`r abs(round(coef_table['RegionWest Midlands','Estimate'],0))` - Yorkshire and the Humber: -£`r abs(round(coef_table['RegionYorkshire and the Humber','Estimate'],0))`- People who arrived in the UK within the last year earn £`r abs(round(coef_table['BORNUK_labelledWithin the last year','Estimate'],0))` less than people born in the UK- People who arrived within the last 30 years earn £`r abs(round(coef_table['BORNUK_labelledWithin the last 30 years','Estimate'],0))` more than people born in the UK.### Income 
group^[[../outputs/data/income_group_outsourcing.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/income_group_outsourcing.csv)]

```{r}
# test significance
mod <- glm(income_group ~ outsourcing_status, data, family="quasibinomial", weights = NatRepemployees)
# summary(mod)
test <- summary(mod)
or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]][2,4]

mod_2 <- glm(income_group ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod_2)
# take the p value from the adjusted model, indexing by name rather than row position
test <- summary(mod_2)
or <- exp(mod_2[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

coef_table <- extract_lm_coefs(mod_2)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod_2, only_sig = T)
write_csv(coef_table, file="../outputs/data/income_group_outsourcing.csv")
```

A person is more likely to be in the low income group if they:

- are older
- are female
- preferred not to say when they arrived in the UK

And less likely if they:

- are East Asian
- live in the North West or Wales
- arrived in the UK within the last 30 years

### Gender pay gap

[outputs/data/gender_outsourced_gap.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/gender_outsourced_gap.csv)

[outputs/data/mod_gender_outsourcing.csv](outputs/data/mod_gender_outsourcing.csv)

```{r gender-pay-gap-1}
gender_outsourced_gap <- income_data %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

not_outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

outsourced_gap <- gender_outsourced_gap %>%
  filter(outsourcing_status == "Outsourced") %>%
  select(c(outsourcing_status, Gender, median)) %>%
  pivot_wider(names_from = "Gender", values_from = "median") %>%
  mutate(
    diff = Male - Female
  ) %>%
  pull(diff)

write_csv(gender_outsourced_gap, "../outputs/data/gender_outsourced_gap.csv")
```

Exploring the gender pay gap by outsourcing status indicates that the pay gap does not differ depending on whether workers are outsourced or not. For non-outsourced workers, median pay for women is £`r not_outsourced_gap` lower than for men. For outsourced workers, median pay for women is £`r outsourced_gap` lower than for men. The difference between these two gaps is not statistically significant.

```{r gender-outsourcing-int}
ggplot(gender_outsourced_gap, aes(outsourcing_status, median, fill = Gender)) +
  geom_col(position="dodge") +
  geom_label(aes(label=round(median,0)), position=position_dodge(width=0.9)) +
  theme_minimal() +
  ylab("Median income") +
  xlab("Outsourcing status")

simp_mod <- lm(income_annual_all ~ Gender*outsourcing_status, income_data, weights = NatRepemployees)
# summary(simp_mod)

mod_2 <- lm(income_annual_all ~ Age + Ethnicity_collapsed + Region + Gender*outsourcing_status, income_data, weights = NatRepemployees)
# summary(mod_2)

mod_3 <- update(mod_2, ~.+ BORNUK_labelled)
# summary(mod_3)
# anova(mod_2, mod_3) # adding BORNUK improves model fit

coef_table <- extract_lm_coefs(mod_3)
rownames(coef_table) <- coef_table$variable
# significant coefficients from the model reported here (mod_3), not the earlier glm
sig_coefs <- extract_glm_coefs(mod_3, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing.csv")
```

We also checked whether the outsourcing * gender interaction is relevant for whether someone is low paid or not. 
It isn't.

```{r}
mod <- glm(income_group ~ Age + Ethnicity_collapsed + Region + Gender*outsourcing_status + BORNUK_labelled, income_data, family="quasibinomial", weights = NatRepemployees)
summary(mod)
test <- summary(mod)
or <- exp(mod[["coefficients"]][["outsourcing_statusOutsourced"]])
p <- test[["coefficients"]]["outsourcing_statusOutsourced", 4]

# save the income-group model fitted in this chunk (previously mod_3, the linear income model)
coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = T)
write_csv(coef_table, "../outputs/data/mod_gender_outsourcing_income_group.csv")
```

Notable takeaways:

- There is a substantial gender pay gap present in the data. The pay gap does not differ significantly between outsourced and non-outsourced workers.
- The South East is the highest-paid region after London. Northern Ireland is the lowest-paid region.
- People who have very recently arrived in the UK are paid less than people who were born in the UK, whilst people who migrated to the UK a long time ago earn more than people born in the UK.

```{r}
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = T),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = T),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = T),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = T),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = T))
  )

readr::write_csv(income_statistics, file="../outputs/data/income_stats_o-group.csv")

income_data <- filter(data, income_drop_all==0)

mod <- lm(income_annual_all ~ outsourcing_group, income_data, weights = NatRepemployees)
# summary(mod)

coef_table <- extract_lm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = T)
write_csv(coef_table, file="../outputs/data/model_income_by_o-group.csv")
```

Next we explore differences by outsourcing group. 
The table and plot below show descriptive statistics on income and its distribution for outsourced groups. Regression analysis shows that **outsourced workers are on average paid £`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less than non-outsourced workers**, while no differences are evident for the likely agency and high indicators groups.[^10][^10]: [outputs/data/income_stats_o-group.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/income_stats_o-group.csv) & [outputs/data/model_income_by_o-group.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/model_income_by_o-group.csv)```{r}knitr::kable(income_statistics, digits =2, col.names =c("Outsourcing group","n","Mean","Median","Min","Max","Standard dev.")) %>%kable_styling(full_width = F)data %>%filter(income_drop_all ==0&!is.na(income_annual_all)) %>%ggplot(., aes(outsourcing_group, income_annual_all)) +geom_violin() +geom_boxplot(width =0.3) +geom_text(inherit.aes=F, data=income_statistics, aes(outsourcing_group, y =6e+04), label=paste0("Mean = ", round(income_statistics$mean,0),"\n", "Median = ", round(income_statistics$median,0),"\n N = ", income_statistics$n), nudge_x =0.1, hjust=0) +coord_cartesian(xlim=c(1,2.5)) +theme_minimal() +xlab("Outsourcing group") +ylab("Annual income") +coord_cartesian(ylim =c(plyr::round_any(min(income_statistics$min), 5000, f = floor),plyr::round_any(max(income_statistics$max),5000, f = ceiling))) +scale_y_continuous(breaks =seq(plyr::round_any(min(income_statistics$min), 5000, f = ceiling), plyr::round_any(max(income_statistics$max),5000, f = ceiling), 10000))``````{r}mod_2 <-lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + Region + outsourcing_group, income_data, weights = NatRepemployees)# summary(mod_2)mod_3 <-update(mod_2, ~.+ BORNUK_labelled) # summary(mod_3)# anova(mod_2, mod_3) # adding BORNUK improves model fitcoef_table <-extract_lm_coefs(mod_3)rownames(coef_table) <- 
coef_table$variablesig_coefs <-extract_glm_coefs(mod_3, only_sig = T)write_csv(coef_table, file="../outputs/data/model_2_income_by_o-group.csv")```However, when controlling, as before, for Age, Gender, Ethnicity, Arrival Time, and Region,[^11] we find[^11]: [outputs/data/model_2_income_by_o-group.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/model_2_income_by_o-group.csv)- the outsourced group on average earns **£`r abs(round(coef_table['outsourcing_groupOutsourced','Estimate'],0))` less** than the non-outsourced group, and- the likely agency group on average earns **£`r abs(round(coef_table['outsourcing_groupLikely agency','Estimate'],0))` less** than the non-outsourced groupIn addition to showing that likely agency workers receive lower pay than the non-outsourced workers, this analysis reveals that "pure outsourced" workers' pay is even lower, and that the estimate we obtained in the analysis above considering only status is a diluted effect averaging the outsourced and likely agency pay gaps.<!-- ## Pay differences --><!-- ```{r} --><!-- mod <- lm(income_annual_all ~ Age + Gender + Ethnicity_collapsed + BORNUK_labelled + Region + SectorName_short, income_data, weights = NatRepemployees) --><!-- summary(mod) --><!-- ``` --><!-- ```{r} --><!-- mod <- lm(income_annual_all ~ SectorName_short*outsourcing_status, income_data, weights = NatRepemployees) --><!-- summary(mod) --><!-- ``` --><!-- ```{r} --><!-- mod <- lm(income_annual_all ~ SectorName_short*outsourcing_group, income_data, weights = NatRepemployees) --><!-- summary(mod) --><!-- ## work out how to just plot certain levels! 
## --><!-- sjPlot::plot_model(mod, type = "pred", legend.title="", terms = c("SectorName_short","outsourcing_group"), dodge=1) + --><!-- coord_flip() --><!-- sig_coefs <- extract_lm_coefs(mod, only_sig = T) #, decimal_places = 10)# --><!-- ``` -->### Variations in pay```{r sector-bubble}sector_summary_pay <- data %>% filter(income_drop_all == 0) %>% group_by(SectorName, SectorName_labelled, outsourcing_status) %>% summarise( n = n(), Frequency = sum(NatRepemployees), avg_income = mean(income_annual_all, na.rm=T), wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T) ) %>% ungroup() %>% group_by(SectorName) %>% mutate( N = sum(n), Sum = sum(Frequency), perc = 100 * (Frequency/Sum), SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled), SectorName_short = SectorName_labelled ) %>% # make the sector names more readable separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim=";", too_few = "align_start") %>% mutate( SectorName_short = factor(stringr::str_to_sentence(SectorName_short)), SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)), )write_csv(sector_summary_pay, file="../outputs/data/sector_summary_pay.csv")plot_data <- sector_summary_pay %>% drop_na(SectorName_short) %>% droplevels() %>% ungroup()# Filter for 'outsourced' level and reorder SectorName_shortnot_outsourced_levels <- plot_data %>% filter(outsourcing_status == 'Not outsourced') %>% mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))outsourced <- plot_data %>% filter(outsourcing_status == 'Outsourced') %>% mutate( rank = rank(desc(perc)) )# Apply the reordered levels back to the original dataplot_data <- plot_data %>% mutate( SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)), )annotation_df <- plot_data %>% filter(outsourcing_status == "Not outsourced") %>% 
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r}
sector_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_status) %>%
  drop_na(income_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(SectorName, income_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-status.csv")
```

```{r}
plot_data <- sector_summary_paysplit %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_status, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

<!-- To test what sectors are more likely to employ outsourced workers, it is necessary to choose a "reference" sector against which to compare other sectors. A priori there is no theoretical candidate sector that should be the reference. However, we can select a reference based on what we know about the proportion of workers in each sector. One strategy is to choose the sector that has the lowest proportion of outsourced workers.
Doing so means that interpretation is something along the lines of --><!-- "compared to the sector that we know has the smallest outsourced workforce, a worker is *x* times more likely to be outsourced if they work in Sector A" --><!-- From the figure above we can see that the sectors with the lowest proportion of outsourced workers are "Mining..." and "Activities of extraterritorial...", which both have zero outsourced workers. A problem with using these is that the sample sizes are very small. The next-lowest is "Agriculture...", but here the sample is quite small too. The next-lowest is "Public administration and defence", which has an outsourced workforce of around **`r round(sector_summary_3 %>% filter(SectorName_short == "Public administration and defence" & outsourcing_status == "Outsourced") %>% pull(perc),0)`%** and a sample size of **`r sector_summary_3 %>% filter(SectorName_short == "Public administration and defence" & outsourcing_status == "Outsourced") %>% pull(N)`**. This is probably the best candidate as a reference, because it has a reliable sample size and offers a low outsourcing baseline against which to compare other sectors. It is also quite neat that this sector is basically civil service, which also distinguishes it from other sectors. 
-->

```{r}
# relevel sectorname_short
data <- data %>%
  mutate(
    SectorName_short = forcats::fct_relevel(SectorName_short, "Public administration and defence")
  )
```

```{r include=FALSE}
mod <- glm(outsourcing_status ~ SectorName_short, data, weights = NatRepemployees, family = "quasibinomial")
summary(mod)

coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = TRUE) #, decimal_places = 10)#
```

```{r include=FALSE}
plot_model <- function(mod){
  coefs <- extract_glm_coefs(mod)
  confints <- confint(mod)
  vars <- rownames(confints)

  # exponentiate the confidence limits so they are on the odds-ratio scale
  confints <- confints %>%
    as_tibble() %>%
    mutate(
      variable = vars, .before = everything()
    ) %>%
    rename(
      ci_low = `2.5 %`,
      ci_upp = `97.5 %`
    ) %>%
    mutate(
      ci_low = exp(ci_low),
      ci_upp = exp(ci_upp)
    )

  # keep only the significant coefficients
  coef_table <- coefs %>%
    left_join(., confints, by = "variable") %>%
    filter(`Pr(>|t|)` < .05)

  max <- ceiling(max(coef_table$ci_upp))

  p <- ggplot(coef_table, aes(variable, or)) +
    geom_point() +
    geom_errorbar(aes(ymin = ci_low, ymax = ci_upp)) +
    coord_flip() +
    geom_hline(yintercept = 1, colour = "red") +
    scale_y_continuous(breaks = seq(0, max, 1))

  return(p)
}

plot_model(mod)
```

```{r}
sector_summary <- data %>%
  #filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail
      = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary, file = '../outputs/data/sector_summary_o-group.csv')
```

Exploring this by type of outsourced worker shows that for all sectors, the majority of outsourced workers fall into the 'outsourced' group.[^12]

[^12]: [outputs/data/sector_summary_o-group.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/sector_summary_o-group.csv)

```{r}
plot_data <- sector_summary %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

# annotation_df <- plot_data %>%
#   select(SectorName_short, outsourcing_status, perc, n
#   mutate(
annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 80
  )

ggplot(plot_data, aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col() +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  scale_y_continuous(breaks = seq(0, 100, 10))
```

The next most common group after 'outsourced' varies by sector. Many sectors have an almost even split of likely agency and high indicator groups.
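
The within-sector percentages behind these comparisons are weighted shares: the survey weights are summed for each sector and outsourcing group, then divided by the sector total. The sketch below illustrates that arithmetic on made-up data. The `toy` data frame and its values are hypothetical and are not taken from the survey; only the column names mirror those used in this report.

```r
library(dplyr)

# Hypothetical toy data: one row per respondent with a survey weight.
toy <- tibble(
  SectorName_short  = c("Construction", "Construction", "Construction", "Education", "Education"),
  outsourcing_group = c("Not outsourced", "Outsourced", "Likely agency", "Not outsourced", "Outsourced"),
  NatRepemployees   = c(6, 3, 1, 8, 2)
)

toy_summary <- toy %>%
  # weighted count per sector and outsourcing group
  group_by(SectorName_short, outsourcing_group) %>%
  summarise(Frequency = sum(NatRepemployees), .groups = "drop") %>%
  # share of each group within its sector
  group_by(SectorName_short) %>%
  mutate(
    Sum  = sum(Frequency),
    perc = 100 * Frequency / Sum
  ) %>%
  ungroup()

toy_summary
```

Because `perc` is computed within each sector, the values for a sector always sum to 100, which is why the stacked bars above all have the same length.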
Sectors that are notable for having quite large likely agency proportions relative to high indicator proportions are:

- Construction
- Accommodation and food service activities
- Activities of households as employers (note N = 32)

In contrast, sectors with a high proportion of 'high indicators' workers relative to likely agency workers are:

- Other service activities
- Professional, scientific and technical activities
- Real estate activities

```{r}
annotation_df <- plot_data %>%
  filter(outsourcing_group != "Not outsourced") %>%
  select(SectorName_short, N) %>%
  mutate(
    ypos = 20
  )

plot_data %>%
  filter(outsourcing_group != "Not outsourced") %>%
  ggplot(., aes(SectorName_short, perc, fill = outsourcing_group)) +
  geom_col(position = "dodge") +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = SectorName_short, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  labs(caption = "Note: N labels reflect total for sector including not outsourced (not shown here)")
# scale_y_continuous(breaks=seq(0,100,10))
```

### Variations in pay

```{r}
sector_summary <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(SectorName, SectorName_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

write_csv(sector_summary, file
  = '../outputs/data/sector_summary_o-group_pay.csv')

plot_data <- sector_summary %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.title = element_blank()
    # axis.text.x = element_text(angle=45, hjust=1)
  ) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r}
sector_summary_paysplit <- data %>%
  #filter(income_drop_all == 0) %>%
  drop_na(income_group) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    # avg_income = mean(income_annual_all, na.rm=T),
    # wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm=T)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

# write the summary built in this chunk (previously wrote sector_summary_3)
write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-group.csv")
```

```{r sector-paysplit-ogroup}
sector_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  drop_na(income_group) %>%
  group_by(SectorName, SectorName_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short, names = c("SectorName_short", "SectorName_short_detail"), delim = ";", too_few = "align_start") %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail)),
  )

# write the summary built in this chunk (previously wrote sector_summary_3)
write_csv(sector_summary_paysplit, file = "../outputs/data/sector_summary_paysplit_o-group_pay.csv")

plot_data <- sector_summary_paysplit %>%
  drop_na(SectorName_short) %>%
  droplevels() %>%
  ungroup()

# Filter for 'outsourced' level and reorder SectorName_short
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(SectorName_short
    = forcats::fct_reorder(SectorName_short, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    SectorName_short = factor(SectorName_short, levels = levels(not_outsourced_levels$SectorName_short)),
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  select(SectorName_short, N) %>%
  group_by(SectorName_short) %>%
  summarise(
    N = sum(N)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  # mutate(
  #   SectorName = as.factor(SectorName)
  # ) %>%
  ggplot(., aes(wtd_avg_income, SectorName_short, size = perc, colour = outsourcing_group, shape = income_group)) +
  geom_point() +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.justification = "right",
    legend.title = element_blank()
    # plot.margin = unit(c(1,1,1,1), "cm")
  ) +
  # coord_flip() +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = SectorName_short, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Sector") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

## Ethnicity

```{r}
ethnicity_statistics <- data %>%
  group_by(outsourcing_status, Ethnicity_labelled) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum),
    Ethnicity_short = Ethnicity_labelled
  ) %>%
  separate_wider_delim(Ethnicity_short,
    names = c("Ethnicity_short", "Ethnicity detail"),
    delim = stringr::regex(" / |, "), # use multiple delims
    too_few = "align_start",
    too_many = "merge"
  )

readr::write_csv(ethnicity_statistics, file = "../outputs/data/ethnicity_stats_1.csv")
```

```{r ethnicity_inferential, output=FALSE}
ethnicities <- as.vector(unique(data$Ethnicity_labelled))
non_white_ethnicities <- ethnicities[!(ethnicities %in% "English / Welsh / Scottish / Northern Irish / British")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    Ethnicity_binary = forcats::fct_collapse(Ethnicity_labelled,
      "White British" = c("English / Welsh / Scottish / Northern Irish / British"),
      "Non-White British" = non_white_ethnicities)
  )

mod <- glm(outsourcing_status ~ Ethnicity_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
#summary(mod)

coefs <- extract_glm_coefs(mod)

write_csv(coefs, file = "../outputs/data/ethnicity_binary_o-status_inferential_tab.csv")
```

People from an ethnic minority are `r coefs[2, 'or']` times more likely to be outsourced than people from a White British background; `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Outsourced" & ethnicity_statistics$Ethnicity_labelled == "English / Welsh / Scottish / Northern Irish / British"), "Percentage"],2)`% of outsourced workers are from an ethnic minority, compared to `r round(100 - ethnicity_statistics[which(ethnicity_statistics$outsourcing_status == "Not outsourced" & ethnicity_statistics$Ethnicity_labelled == "English / Welsh / Scottish / Northern Irish / British"), "Percentage"],2)`% of non-outsourced workers.[^13]

[^13]: [outputs/data/ethnicity_stats_1.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_stats_1.csv) & [outputs/data/ethnicity_binary_o-status_inferential_tab.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_binary_o-status_inferential_tab.csv)

```{r}
data %>%
  group_by(outsourcing_status, Ethnicity_binary) %>%
  summarise(
    n = n(), # count cases
    Frequency = sum(NatRepemployees) # count weighted cases
  ) %>%
  mutate(
    N
      = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Ethnicity_binary)) +
  geom_col(colour = "black") +
  annotate("text", x = ethnicity_statistics$outsourcing_status, y = 99, label = paste0("N = ", ethnicity_statistics$N), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Ethnicity") +
  xlab("Outsourcing group") +
  theme_minimal()
```

```{r ethnicity-status}
mod <- glm(outsourcing_status ~ Ethnicity_collapsed, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

coef_table <- extract_glm_coefs(mod)
rownames(coef_table) <- coef_table$variable
sig_coefs <- extract_glm_coefs(mod, only_sig = TRUE)

write_csv(coef_table, file = "../outputs/data/ethnicity_model_inferential.csv")
```

Comparison of ethnicities indicates that some groups are statistically more likely to be outsourced than others[^14]:

[^14]: [outputs/data/ethnicity_model_inferential.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_model_inferential.csv)

- White other workers are `r coef_table["Ethnicity_collapsedWhite other", "or"]` times more likely than White workers to be outsourced.
- Black African workers are `r coef_table["Ethnicity_collapsedBlack African", "or"]` times more likely than White workers to be outsourced.
- Mixed other workers are `r coef_table["Ethnicity_collapsedMixed other", "or"]` times more likely than White workers to be outsourced.
- South Asian workers are `r coef_table["Ethnicity_collapsedSouth Asian", "or"]` times more likely than White workers to be outsourced.
- Other workers are `r coef_table["Ethnicity_collapsedOther", "or"]` times more likely than White workers to be outsourced.
- Black other workers are `r coef_table["Ethnicity_collapsedBlack other", "or"]` times more likely than White workers to be outsourced.
- Arab workers are `r coef_table["Ethnicity_collapsedArab", "or"]` times more likely than White workers to be outsourced.

White other, Black Caribbean, and East
Asian workers are no more or less likely than White workers to be outsourced.

```{r ethnicity-group}
mod <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights = NatRepemployees)
#summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")

# two-sided p-values from the Wald z statistics
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")

# keep odds ratios only where p < 0.01 (others become NA)
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/ethnicity_ogroup_inferential_tab.csv")
# sig_ors
```

Breaking down by outsourcing group helps to separate out the *type* of outsourced work people from the ethnicities identified above engage in.[^15] Compared to White British workers:

[^15]: [outputs/data/ethnicity_ogroup_inferential_tab.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/ethnicity_ogroup_inferential_tab.csv)

- White other workers are more likely to be outsourced than not outsourced
- Black African workers are more likely to be outsourced, likely agency, or high indicators than not outsourced
- Mixed other workers are more likely to be likely agency workers than not outsourced
- South Asian workers are more likely to be outsourced, likely agency, or high indicators than not outsourced
- Other workers are more likely to be outsourced or likely agency than not outsourced
- Black other workers are more likely to be outsourced than not outsourced
- Arab workers are more likely to be likely agency or high indicators than not outsourced

```{r}
sjPlot::plot_model(mod)
```

## Arrival in the UK

```{r}
bornuk_statistics <- data %>%
  group_by(outsourcing_status, BORNUK_labelled) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

readr::write_csv(bornuk_statistics,
  file = "../outputs/data/arrival_in_UK_stats.csv")
```

```{r bornuk_inferential, output=FALSE}
categories <- as.vector(unique(data$BORNUK_labelled))
non_categories <- categories[!(categories %in% "I was born in the UK")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
data <- data %>%
  mutate(
    BORNUK_binary = forcats::fct_collapse(BORNUK_labelled,
      "Born in UK" = "I was born in the UK",
      "Not born in UK" = non_categories)
  )

mod <- glm(outsourcing_status ~ BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# mod <- glm(Ethnicity_binary~outsourcing_status , data, weights = NatRepemployees, family="quasibinomial")
summary(mod)

coefs <- extract_glm_coefs(mod)

write_csv(coefs, file = "../outputs/data/bornuk_ostatus_inferential_tab.csv")
```

As with non-outsourced workers, the vast majority of outsourced workers were born in the UK. However, people not born in the UK are more likely to be outsourced than people born in the UK. `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Outsourced" & bornuk_statistics$BORNUK_labelled == "I was born in the UK"), "Percentage"],2)`% of outsourced workers were not born in the UK, compared to `r 100 - round(bornuk_statistics[which(bornuk_statistics$outsourcing_status == "Not outsourced" & bornuk_statistics$BORNUK_labelled == "I was born in the UK"), "Percentage"],2)`% of non-outsourced workers.[^16] This difference is statistically significant; **outsourced workers are `r round(coefs %>% filter(variable == "BORNUK_binaryNot born in UK") %>% pull(or),2)` times more likely to have been born outside the UK than non-outsourced workers.**[^17]

[^16]: [outputs/data/arrival_in_UK_stats.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats.csv)

[^17]: [outputs/data/bornuk_ostatus_inferential_tab.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ostatus_inferential_tab.csv)

::: callout-note
This variable is worded a
little strangely, e.g. responses are things like "within the last 10 years", "within the last 15 years". Given that respondents only give one answer to this question, I think we can assume that the responses are basically brackets. That is, someone responding "within the last 15 years" is basically saying "I came to the UK between 11 and 15 years ago".
:::

The figure below shows that, for every arrival time apart from 'Within the last 30 years', the proportion is larger among outsourced workers than among non-outsourced workers.

```{r}
# bornuk_statistics %>%
#   ggplot(., aes(outsourcing_status, Percentage, fill = BORNUK_labelled)) +
#   geom_col(colour="black", position = "dodge") +
#   annotate("text", x = bornuk_statistics$outsourcing_status, y = 75, label = paste0("n=",bornuk_statistics$N)) +
#   coord_flip() +
#   scale_fill_manual(values=many_colours, name="Arrival in UK") +
#   theme_minimal() +
#   xlab("Outsourcing group")

bornuk_statistics %>%
  ggplot(., aes(BORNUK_labelled, Percentage, fill = outsourcing_status)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_labelled, y = 99, label = paste0("n = ", n)), position = position_dodge(width = 1), hjust = 1) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  theme_minimal() +
  xlab("Arrival in UK")
```

```{r}
mod <- multinom(outsourcing_group ~ BORNUK_binary, data, weights = NatRepemployees)
#summary(mod)

# get coefficients and calculate p
coefs <- summary(mod)$coefficients
ors <- exp(coefs)
colnames(ors) <- paste(colnames(ors), "or", sep = "_")

# two-sided p-values from the Wald z statistics
z <- coefs / summary(mod)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
colnames(p) <- paste(colnames(p), "p", sep = "_")

# keep odds ratios only where p < 0.01 (others become NA)
p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
sig_ors <- exp(summary(mod)$coefficients * p_2)

# add to table for saving
coefs <- cbind(coefs, ors, p) %>%
  as_tibble()

write_csv(coefs, file = "../outputs/data/bornuk_ogroup_inferential_tab.csv")
# sig_ors
```

```{r o-group}
bornuk_statistics_2 <- data %>%
  group_by(outsourcing_group, BORNUK_labelled) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )

# write the by-group table built in this chunk (previously wrote bornuk_statistics)
readr::write_csv(bornuk_statistics_2, file = "../outputs/data/arrival_in_UK_stats_2.csv")
```

Exploring *types* of outsourced work indicates that the pattern observed above applies evenly to the different outsourcing groups.[^18] Compared to people born in the UK, people not born in the UK are:

[^18]: [outputs/data/bornuk_ogroup_inferential_tab.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/bornuk_ogroup_inferential_tab.csv) & [outputs/data/arrival_in_UK_stats_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/arrival_in_UK_stats_2.csv)

- `r round(sig_ors['Outsourced', 2],2)` times more likely to be outsourced than non-outsourced
- `r round(sig_ors['Likely agency', 2],2)` times more likely to be likely agency than non-outsourced
- `r round(sig_ors['High indicators', 2],2)` times more likely to be high indicators than non-outsourced

The figure below indicates that the proportions of workers in each outsourcing group are broadly similar across arrival times.

```{r}
bornuk_statistics_2 %>%
  ggplot(., aes(BORNUK_labelled, Percentage, fill = outsourcing_group)) +
  geom_col(colour = "black", position = "dodge") +
  geom_text(aes(BORNUK_labelled, y = Percentage, label = paste0("n = ", n)), position = position_dodge(width = 1), hjust = 0, size = 3) +
  coord_flip() +
  scale_fill_manual(values = many_colours, name = "Outsourcing group") +
  theme_minimal() +
  xlab("Arrival in UK") +
  ylim(0, 100)
```

## Interaction: Ethnicity and arrival time

```{r}
base_mod <- glm(outsourcing_status ~ Ethnicity_collapsed + BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
mod <- glm(outsourcing_status ~ Ethnicity_collapsed * BORNUK_binary, data, weights = NatRepemployees, family = "quasibinomial")
# summary(mod)

# check that interaction improves the model over main effects - it does
anova(base_mod, mod, test = "F")

coefs <- extract_glm_coefs(mod)
```

```{r}
ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_binary")
cons <- summary(contrast(ems, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate
    # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/ethnicity_bornUK_binary_contrasts.csv")
```

Exploring the intersection of ethnicity and arrival time reveals that the likelihood of a person being outsourced depends on the combination of their ethnicity and whether they were born in the UK.[^19] The plot below shows that:

[^19]: [outputs/data/bornUK_binary_contrasts.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_binary_contrasts.csv)

- Among workers born in the UK, a Black African worker is `r round(sig_cons %>% filter(contrast == "White - Black African") %>% pull(or),2)` times more likely to be outsourced than a White worker.
- Among workers born in the UK, a South Asian worker is `r round(sig_cons %>% filter(contrast == "White - South Asian") %>% pull(or),2)` times more likely to be outsourced than a White worker.
- Among workers not born in the UK, a Black African worker is `r round(sig_cons %>% filter(contrast == "White other - Black African") %>% pull(or),2)` times more likely to be outsourced than a White other worker.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("BORNUK_binary", "Ethnicity_collapsed"), dodge = 0.5) +
  coord_flip()
```

```{r}
ems_2 <- emmeans(mod, specs = "BORNUK_binary", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems_2, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate
    # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )

write_csv(cons, file = "../outputs/data/bornUK_binary_contrasts_2.csv")
```

Similarly, the plot below shows
that:[^20]

[^20]: [outputs/data/region_stats_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_2.csv)

- Among White workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "White") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among Mixed other workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Mixed other") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among Other workers, someone not born in the UK is `r round(sig_cons %>% filter(contrast == "Born in UK - Not born in UK" & Ethnicity_collapsed == "Other") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("Ethnicity_collapsed", "BORNUK_binary"), dodge = 0.5) +
  coord_flip()
```

Put differently, being born in the UK is relevant in predicting outsourcing status only for White, Mixed other, and Other ethnicities. For other ethnicities, whether a worker was born in the UK makes no difference. And compared to a White person born in the UK, Black African and South Asian workers are more likely to be outsourced whether or not they were born in the UK.

Overall, this pattern of results paints a racialised picture with strong colonial undertones. UK-born Black African and South Asian workers are more likely than UK-born White workers to be outsourced. For these and most other non-White groups, being born in the UK is not relevant for predicting outsourcing status; a Black African worker is just as likely to be outsourced if they arrived in the UK today as if they had been born in the UK. However, the story is not only one of race.
Non-UK-born White people are more likely to be outsourced than UK-born White people.

In summary, among people born in the UK, Black African and South Asian workers are more likely to be outsourced than White workers; and among White and (mixed) other ethnicities, workers not born in the UK are more likely to be outsourced than those born in the UK.

```{r output=FALSE}
data <- data %>%
  mutate(
    BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled,
      "Born in UK" = "I was born in the UK",
      "Came to UK recently" = c("Within the last year", "Within the last 3 years", "Within the last 5 years", "Within the last 10 years"),
      "Came to UK not recently" = c("Within the last 15 years", "Within the last 20 years", "Within the last 30 years", "More than 30 years ago"),
      "Prefer not to say" = c("Prefer not to say")
    )
  )

mod <- glm(outsourcing_status ~ Ethnicity_collapsed * BORNUK_collapsed, data, family = "quasibinomial", weights = NatRepemployees)
#summary(mod)

ems <- emmeans(mod, specs = "Ethnicity_collapsed", by = "BORNUK_collapsed")
cons <- summary(contrast(ems, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate
    # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )
# sig_cons

write_csv(cons, file = "../outputs/data/bornUK_collapsed_contrasts.csv")
```

We next explore arrival time by collapsing responses to the arrival time question into fewer categories, as below:

+-------------------------+------------------------------+
| Collapsed level         | Original level               |
+=========================+==============================+
| Born in UK              | - I was born in the UK       |
+-------------------------+------------------------------+
| Came to UK recently     | - Within the last year       |
|                         |                              |
|                         | - Within the last 3 years    |
|                         |                              |
|                         | - Within the last 5 years    |
|                         |                              |
|                         | - Within the last 10 years   |
+-------------------------+------------------------------+
| Came to UK not recently | - Within the last 15 years   |
|                         |                              |
|                         | - Within the last 20 years   |
|                         |                              |
|                         | - Within the last 30 years   |
|                         |                              |
|                         | - More than 30 years ago     |
+-------------------------+------------------------------+
| Prefer not to say       | - Prefer not to say          |
+-------------------------+------------------------------+

Exploring these categories[^21] confirms that:

[^21]: [outputs/data/region_stats_3.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/region_stats_3.csv)

- Among workers born in the UK, a Black African worker is `r round(sig_cons %>% filter(contrast == "White - Black African") %>% pull(or),2)` times more likely to be outsourced than a White worker.
- Among workers born in the UK, a South Asian worker is `r round(sig_cons %>% filter(contrast == "White - South Asian") %>% pull(or),2)` times more likely to be outsourced than a White worker.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("BORNUK_collapsed", "Ethnicity_collapsed"), dodge = 0.5) +
  coord_flip()
```

```{r}
ems_2 <- emmeans(mod, specs = "BORNUK_collapsed", by = "Ethnicity_collapsed")
cons <- summary(contrast(ems_2, "pairwise", adjust = "tukey"))

sig_cons <- cons %>%
  filter(p.value < .05) %>%
  mutate(
    or = 1 / exp(estimate), .after = estimate
    # 1 / or because we want to express comparison - white(ref) (contrast expresses white(ref) - comparison)
  )
# sig_cons

write_csv(cons, file = "../outputs/data/bornUK_collapsed_contrasts_2.csv")
```

And:[^22]

[^22]: [outputs/data/bornUK_collapsed_contrasts_2.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/bornUK_collapsed_contrasts_2.csv)

- Among White workers,
    - Someone who came to the UK recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK recently" & Ethnicity_collapsed == "White") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
    - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK not recently" & Ethnicity_collapsed == "White") %>% pull(or),2)` times
more likely to be outsourced than someone born in the UK.
    - Someone who preferred not to say when they arrived is `r round(sig_cons %>% filter(contrast == "Born in UK - Prefer not to say" & Ethnicity_collapsed == "White") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
- Among East Asian workers
    - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK not recently" & Ethnicity_collapsed == "East Asian") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.
    - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Came to UK recently - Came to UK not recently" & Ethnicity_collapsed == "East Asian") %>% pull(or),2)` times more likely to be outsourced than someone who came to the UK recently.
- Among Other workers
    - Someone who came to the UK not recently is `r round(sig_cons %>% filter(contrast == "Born in UK - Came to UK recently" & Ethnicity_collapsed == "Other") %>% pull(or),2)` times more likely to be outsourced than someone born in the UK.

```{r}
sjPlot::plot_model(mod, type = "pred", legend.title = "", terms = c("Ethnicity_collapsed", "BORNUK_collapsed"), dodge = 0.5) +
  coord_flip()
```

In summary,

- White workers who were not born in the UK are more likely to be outsourced than UK-born White workers.
- East Asian and Other workers who have been in the UK for longer (10 years plus) are more likely to be outsourced.
- UK-born Black African and South Asian workers are more likely to be outsourced than UK-born White workers, but no more or less likely to be outsourced than non-UK-born Black African and South Asian workers.

# Characteristics of outsourced work

## Major occupations

```{r MGC}
# Treat the literal string "NA" as missing and tidy the labels
data <- data %>%
  mutate(
    Majorgroupcode_labelled = na_if(Majorgroupcode_labelled, "NA")
  ) %>%
  mutate(
    Majorgroupcode_labelled = factor(stringr::str_to_sentence(Majorgroupcode_labelled))
  )

# Weighted share of each outsourcing status within each major group
mgc_summary <- data %>%
  group_by(Majorgroupcode_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

readr::write_csv(mgc_summary, "../outputs/data/majorgroupcode_summary_o-status.csv")
```

```{r}
plot_data <- mgc_summary

# Order major groups by the percentage of 'Not outsourced' workers
not_outsourced_levels <- plot_data %>%
  ungroup() %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  ungroup() %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

# One "N = ..." label per major group
annotation_df <- plot_data %>%
  filter(outsourcing_status == "Not outsourced") %>%
  drop_na(Majorgroupcode_labelled) %>%
  select(Majorgroupcode_labelled, N) %>%
  mutate(
    ypos = 80
  )

plot_data %>%
  drop_na(Majorgroupcode_labelled) %>%
  ggplot(aes(Majorgroupcode_labelled, perc, fill = outsourcing_status)) +
  geom_col() +
  coord_flip() +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = Majorgroupcode_labelled, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  scale_fill_manual(values = many_colours, name = "Outsourcing status") +
  ylab("Percentage") +
  xlab("Major group")
```

```{r}
# Same summary, split by outsourcing group rather than outsourcing status
mgc_summary <- data %>%
  group_by(Majorgroupcode_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

readr::write_csv(mgc_summary, "../outputs/data/majorgroupcode_summary_o-group.csv")
```

### Variations in pay

```{r mgc-bubble-status}
mgc_summary_pay <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

write_csv(mgc_summary_pay, file = "../outputs/data/mgc_summary_pay.csv")

plot_data <- mgc_summary_pay %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Order major groups by the percentage of 'Not outsourced' workers
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_status == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_status == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

# One "N = ..." label per major group, placed to the right of the largest income
annotation_df <- plot_data %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  ggplot(aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_status)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = Majorgroupcode_labelled, label = paste0("N = ", N)), hjust = 1) +
  geom_text_repel(inherit.aes = FALSE, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_status, label = paste0("n=", n)), size = 3) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r mgc-bubble-status-2}
mgc_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, income_group, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled, income_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

write_csv(mgc_summary_paysplit, file = "../outputs/data/mgc_summary_paysplit.csv")

plot_data <- mgc_summary_paysplit %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Reuse the ordering from the previous plot so the two plots can be compared
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

annotation_df <- plot_data %>%
  drop_na(income_group) %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  drop_na(income_group) %>%
  ggplot(aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_status, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text_repel(inherit.aes = FALSE, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_status, shape = income_group, label = paste0("n=", n)), size = 3) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = Majorgroupcode_labelled, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

For Elementary occupations, the pattern clearly diverges: among high income workers, **being outsourced is associated with a higher average income**, whereas among low income workers, **being outsourced is associated with a lower average income**. For most other groups, being outsourced is associated with a lower income, regardless of income group.

```{r mgc-o-group}
plot_data <- mgc_summary

# Order major groups by the percentage of 'Not outsourced' workers
not_outsourced_levels <- plot_data %>%
  ungroup() %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  ungroup() %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

annotation_df <- plot_data %>%
  filter(outsourcing_group == "Not outsourced") %>%
  drop_na(Majorgroupcode_labelled) %>%
  select(Majorgroupcode_labelled, N) %>%
  mutate(
    ypos = 80
  )

plot_data %>%
  drop_na(Majorgroupcode_labelled) %>%
  ggplot(aes(Majorgroupcode_labelled, perc, fill = outsourcing_group)) +
  geom_col() +
  coord_flip() +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = Majorgroupcode_labelled, y = ypos, label = paste0("N = ", N)), hjust = 1, nudge_y = 15) +
  scale_fill_manual(values = many_colours, name = "Outsourcing group") +
  ylab("Percentage") +
  xlab("Major group") +
  theme_minimal()
```

### Variations in pay

```{r mgc-bubble-group}
mgc_summary_pay <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

write_csv(mgc_summary_pay, file = "../outputs/data/mgc_summary_pay_group.csv")

plot_data <- mgc_summary_pay %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Order major groups by the percentage of 'Not outsourced' workers
not_outsourced_levels <- plot_data %>%
  filter(outsourcing_group == 'Not outsourced') %>%
  mutate(Majorgroupcode_labelled = forcats::fct_reorder(Majorgroupcode_labelled, perc, .desc = TRUE))

outsourced <- plot_data %>%
  filter(outsourcing_group == 'Outsourced') %>%
  mutate(
    rank = rank(desc(perc))
  )

# Apply the reordered levels back to the original data
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

annotation_df <- plot_data %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  ggplot(aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = Majorgroupcode_labelled, label = paste0("N = ", N)), hjust = 1) +
  geom_text_repel(inherit.aes = FALSE, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_group, label = paste0("n=", n)), size = 3) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

```{r mgc-bubble-group-2}
mgc_summary_paysplit <- data %>%
  filter(income_drop_all == 0) %>%
  group_by(Majorgroupcode_labelled, income_group, outsourcing_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual_all, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(Majorgroupcode_labelled, income_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency/Sum)
  )

# Written to a separate file so the by-status summary from the earlier chunk is not overwritten
write_csv(mgc_summary_paysplit, file = "../outputs/data/mgc_summary_paysplit_group.csv")

plot_data <- mgc_summary_paysplit %>%
  drop_na(Majorgroupcode_labelled) %>%
  droplevels() %>%
  ungroup()

# Reuse the ordering from the previous plot so the two plots can be compared
plot_data <- plot_data %>%
  mutate(
    Majorgroupcode_labelled = factor(Majorgroupcode_labelled, levels = levels(not_outsourced_levels$Majorgroupcode_labelled))
  )

annotation_df <- plot_data %>%
  drop_na(income_group) %>%
  select(Majorgroupcode_labelled, n) %>%
  group_by(Majorgroupcode_labelled) %>%
  summarise(
    N = sum(n)
  ) %>%
  mutate(
    ypos = max(plot_data$wtd_avg_income, na.rm = TRUE) * 1.2
  )

plot_data %>%
  drop_na(income_group) %>%
  ggplot(aes(wtd_avg_income, Majorgroupcode_labelled, size = perc, colour = outsourcing_group, shape = income_group)) +
  geom_point(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "bottom", legend.title = element_blank(), legend.justification = "right") +
  scale_x_continuous(breaks = seq(0, max(plot_data$wtd_avg_income, na.rm = TRUE), 10000)) +
  scale_colour_manual(values = colours) +
  geom_text_repel(inherit.aes = FALSE, aes(wtd_avg_income, Majorgroupcode_labelled, colour = outsourcing_group, shape = income_group, label = paste0("n=", n)), size = 2) +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(x = ypos, y = Majorgroupcode_labelled, label = paste0("N = ", N)), hjust = 1) +
  guides(size = "none") + # remove size legend as gauging size is difficult
  xlab("Weighted average income") +
  ylab("Major group code") +
  labs(caption = "Size of bubble represents the size of the respective workforce")
```

## Unit occupations

```{r}
unit_occ_summary <- data %>%
  filter(outsourcing_status == "Outsourced") %>%
  group_by(UnitOccupation_labelled) %>%
  summarise(
    n = n()
  ) %>%
  mutate(
    UnitOcc_short = UnitOccupation_labelled
  ) %>%
  # make the occupation names more readable
  separate_wider_delim(UnitOcc_short, names = c("UnitOcc_short", "UnitOcc_short_detail"), delim = ", ", too_few = "align_start", too_many = "merge") %>%
  mutate(
    UnitOcc_short = forcats::fct_reorder(UnitOcc_short, n, .desc = FALSE),
    perc = 100 * (n / sum(n))
  ) %>%
  # Sort from most to least common so that the cumulative percentage at rank k
  # is the share of outsourced workers in the top k occupations (inclusive)
  arrange(desc(perc)) %>%
  mutate(
    cum_perc = cumsum(perc),
    rank = row_number()
  )

write_csv(unit_occ_summary, file = "../outputs/data/unit_occ_summary.csv")
```

Examining which unit occupations outsourced workers are found in reveals that they tend to be concentrated in a specific cluster of occupations.[^41] `r 
round(unit_occ_summary %>% filter(rank == 10) %>% pull(cum_perc),0)`% of outsourced workers are located in the top 10 most common unit occupations. The top 15 unit occupations capture over 50% of the outsourced workforce, and `r round(unit_occ_summary %>% filter(rank == 30) %>% pull(cum_perc),0)`% of the outsourced workforce are captured in 30 unit occupations (out of a total of `r unit_occ_summary %>% summarise(max(rank)) %>% pull()`). These thresholds are shown in the plot below, where the blue lines intersect the red curve.

[^41]: [outputs/data/unit_occ_summary.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/unit_occ_summary.csv)

The top 10 unit occupations for outsourced workers are:

- `r unit_occ_summary %>% filter(rank == 1) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 2) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 3) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 4) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 5) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 6) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 7) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 8) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 9) %>% pull(UnitOccupation_labelled)`
- `r unit_occ_summary %>% filter(rank == 10) %>% pull(UnitOccupation_labelled)`

```{r}
#| fig-height: 10
# Occupations at ranks 10, 15 and 30, used to position the threshold lines
r10 <- unit_occ_summary %>%
  filter(rank == 10) %>%
  pull(UnitOcc_short)

r15 <- unit_occ_summary %>%
  filter(rank == 15) %>%
  pull(UnitOcc_short)

r30 <- unit_occ_summary %>%
  filter(rank == 30) %>%
  pull(UnitOcc_short)

unit_occ_summary %>%
  ggplot(aes(n, UnitOcc_short)) +
  geom_col() +
  geom_line(aes(cum_perc, UnitOcc_short, group = 1), colour = "red") +
  labs(caption = "Bars represent number of outsourced workers.\nRed line indicates cumulative percentage of all outsourced workers") +
  geom_hline(yintercept = r10, colour = "blue") +
  geom_hline(yintercept = r15, colour = "blue") +
  geom_hline(yintercept = r30, colour = "blue") +
  theme_minimal() +
  xlab("Number / Cumulative percentage") +
  ylab("Unit Occupation") +
  scale_x_continuous(breaks = seq(0, max(unit_occ_summary$n), 10)) +
  theme(
    # axis.text.y = element_text(size = 4)
  )
```

```{r income-group-1}
# get the list of the 10 most common occupations among outsourced workers
occs <- unit_occ_summary %>%
  slice_head(n = 10) %>%
  mutate(
    UnitOccupation_labelled = as.character(UnitOccupation_labelled)
  ) %>%
  pull(UnitOccupation_labelled)

# Weighted share of each income group within each of those occupations
income_group_summary <- income_data %>%
  filter(!is.na(income_group)) %>%
  filter(outsourcing_status == "Outsourced") %>%
  filter(UnitOccupation_labelled %in% occs) %>%
  group_by(UnitOccupation_labelled, income_group) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees)
  ) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  ungroup() %>%
  mutate(
    UnitOccupation_labelled = factor(as.character(UnitOccupation_labelled))
  )

write_csv(income_group_summary, "../outputs/data/unit_occ_income_group.csv")

most_low_paid <- income_group_summary %>%
  filter(income_group == "Low") %>%
  arrange(desc(Percentage)) %>%
  slice_head(n = 5) %>%
  mutate(
    UnitOccupation_labelled = as.character(UnitOccupation_labelled),
    Percentage = round(Percentage, 2)
  )
```

These occupations differ in the extent to which outsourced workers are low paid.^[[outputs/data/unit_occ_income_group.csv](https://github.com/Project-X-UK/jrf_nat_rep/blob/main/outputs/data/unit_occ_income_group.csv)] The 5 occupations with the highest proportion of low paid outsourced workers are:

1. `r most_low_paid[1,1] %>% pull()`: `r most_low_paid[1, "Percentage"] %>% pull()`%
2. `r most_low_paid[2,1] %>% pull()`: `r most_low_paid[2, "Percentage"] %>% pull()`%
3. `r most_low_paid[3,1] %>% pull()`: `r most_low_paid[3, "Percentage"] %>% pull()`%
4. `r most_low_paid[4,1] %>% pull()`: `r most_low_paid[4, "Percentage"] %>% pull()`%
5. `r most_low_paid[5,1] %>% pull()`: `r most_low_paid[5, "Percentage"] %>% pull()`%

The plot below visualises this.

```{r income-group-2}
# Order occupations by the percentage of low paid outsourced workers
occ_levels <- income_group_summary %>%
  filter(income_group == "Low") %>%
  arrange(Percentage) %>%
  pull(UnitOccupation_labelled)

annotation_df <- income_group_summary %>%
  filter(income_group == "Low") %>%
  select(UnitOccupation_labelled, N) %>%
  mutate(
    ypos = 110
  )

income_group_summary %>%
  mutate(
    UnitOccupation_labelled = factor(UnitOccupation_labelled, levels = occ_levels)
  ) %>%
  ggplot(aes(Percentage, UnitOccupation_labelled, fill = income_group)) +
  geom_col(position = "dodge") +
  geom_text(inherit.aes = FALSE, data = annotation_df, aes(y = UnitOccupation_labelled, x = ypos, label = paste0("N=", N)), hjust = 1, nudge_x = 2) +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  theme_minimal() +
  scale_fill_manual(name = "Income group", values = better_colours) +
  ylab("Unit occupation") +
  ggtitle("Top 10 occupations by income group") +
  labs(caption = "Includes outsourced workers only")
```
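
The within-occupation percentages plotted above are weighted shares: each respondent contributes their survey weight rather than a count of one. As a minimal sketch of that calculation, the example below uses a small made-up data frame. The object `toy` and its columns `occupation`, `income_group` and `weight` are hypothetical stand-ins for `UnitOccupation_labelled`, `income_group` and `NatRepemployees`; the values are illustrative only and are not taken from the survey.

```r
library(dplyr)

# Hypothetical toy data (NOT survey data): two occupations, two income groups
toy <- data.frame(
  occupation   = c("A", "A", "A", "B", "B", "B"),
  income_group = c("Low", "Low", "Not low", "Low", "Not low", "Not low"),
  weight       = c(1.0, 3.0, 4.0, 1.0, 1.5, 2.5)
)

toy_summary <- toy %>%
  group_by(occupation, income_group) %>%
  summarise(
    n = n(),                  # unweighted number of respondents
    Frequency = sum(weight),  # weighted number of workers
    .groups = "drop_last"     # keep grouping by occupation only
  ) %>%
  mutate(
    N = sum(n),                          # respondents per occupation
    Sum = sum(Frequency),                # weighted workers per occupation
    Percentage = 100 * Frequency / Sum   # weighted share within occupation
  ) %>%
  ungroup()

toy_summary
# Percentage is 50 and 50 for occupation A, and 20 and 80 for occupation B
```

Note that in occupation A two of the three respondents are low paid, yet the weighted share is 50% because the single higher-paid respondent carries a weight of 4. This is why the plots report the unweighted `N` alongside the weighted percentages.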